Clinical trial results in context: comparison of baseline characteristics and outcomes of 38,510 RECOVERY trial participants versus a reference population of 346,271 people hospitalised with COVID-19 in England

Background Randomised trials are essential to reliably assess medical interventions. Nevertheless, interpretation of such studies, particularly when considering absolute effects, is enhanced by understanding how the trial population may differ from the populations it aims to represent. Methods We compared baseline characteristics and mortality of RECOVERY participants recruited in England (n = 38,510) with a reference population hospitalised with COVID-19 in England (n = 346,271) from March 2020 to November 2021. We used linked hospitalisation and mortality data for both cohorts to extract demographics, comorbidity/frailty scores, and crude and age- and sex-adjusted 28-day all-cause mortality. Results Demographics of RECOVERY participants were broadly similar to the reference population, but RECOVERY participants were younger (mean age [standard deviation]: RECOVERY 62.6 [15.3] vs reference 65.7 [18.5] years) and less frequently female (37% vs 45%). Comorbidity and frailty scores were lower in RECOVERY, but differences were attenuated after age stratification. Age- and sex-adjusted 28-day mortality declined over time but was similar between cohorts across the study period (RECOVERY 23.7% [95% confidence interval: 23.3–24.1%]; vs reference 24.8% [24.6–25.0%]), except during the first pandemic wave in the UK (March–May 2020) when adjusted mortality was lower in RECOVERY. Conclusions Adjusted 28-day mortality in RECOVERY was similar to a nationwide reference population of patients admitted with COVID-19 in England during the same period but varied substantially over time in both cohorts. Therefore, the absolute effect estimates from RECOVERY were broadly applicable to the target population at the time but should be interpreted in the light of current mortality estimates. Trial registration ISRCTN50189673- Feb. 04, 2020, NCT04381936- May 11, 2020. Supplementary Information The online version contains supplementary material available at 10.1186/s13063-024-08273-9.


Background
Randomised controlled trials (RCTs) are essential to reliably evaluate the safety and efficacy of health interventions [1,2]. The use of randomisation (with allocation concealment) minimises the risk of bias but, inevitably, because of eligibility criteria, trial participants are rarely representative of the populations whose treatment they aim to inform. Nonetheless, the proportional estimates of treatment effects from the trial are usually generalisable to the broader population, unless there are good grounds for believing there may be systematic differences in the effectiveness of the intervention or in the biology of the target disease outside of the trial setting (e.g. the advent of a new variant that renders a pathogen resistant to the particular drug that was studied) [3]. However, the estimates of absolute harm and benefit generated by such trials may not be directly generalisable, and assessment of the absolute rates of the relevant outcomes in the target population is useful to understand the likely absolute effects of the intervention in clinical practice [4,5].
The Randomised Evaluation of COVID-19 Therapy (RECOVERY) trial is a randomised, controlled, open-label, pragmatic, platform trial of potential therapies for patients hospitalised with COVID-19 [6]. Eligibility criteria were broad and simple (i.e. hospitalisation for suspected or confirmed COVID-19), and trial procedures were streamlined to be feasible in local practice. Data collection by trial staff, using dedicated case report forms (CRF), focused on the minimum information needed and was complemented with extensive linkage to several healthcare systems data sources in the UK. The trial took place in all acute UK National Health Service (NHS) hospitals, and in several other countries globally.
Here, we aimed to compare the baseline characteristics (demographics and comorbidities) and all-cause 28-day mortality (the trial primary outcome) for RECOVERY participants with a reference population hospitalised with COVID-19, within England.

RECOVERY cohort
The RECOVERY trial design has been described previously [6]. Briefly, RECOVERY recruited patients admitted to hospital with confirmed or suspected COVID-19 who were considered suitable for inclusion by their attending clinical team. Recruitment was not targeted to any particular subgroups or aimed at achieving a representative sample of the target population; the aim was to recruit a large number of participants rapidly. Randomisation was performed via a short online CRF in which essential baseline data were collected. Follow-up data were collected using a simple CRF upon death, hospital discharge, or at 28 days from randomisation (whichever occurred sooner). In the UK, these data were complemented with linkage to national healthcare systems data sources. The protocol, data analysis plan, baseline characteristics and outcome derivation documentation, and published results are openly available at www.recoverytrial.net, and the trial is registered with ISRCTN (50189673) and ClinicalTrials.gov (NCT04381936). Written informed consent was obtained from all the patients or from a legal representative if they were unable to provide consent. The RECOVERY trial has been approved by the UK Medicines and Healthcare products Regulatory Agency and the Cambridge East Research Ethics Committee (reference 20/EE/0101).
For this analysis, we included all RECOVERY participants recruited in England who had not withdrawn consent and had available healthcare systems data on hospital admissions (Hospital Episode Statistics [HES]) [7], with or without mortality data from official death records (Civil Registrations) [8]. We excluded children aged < 16 years due to difficulties in accessing linked healthcare systems data in this group in RECOVERY. HES data contained information on admissions to all NHS hospitals in England (using standardised coding practices since the 1990s), namely admission and discharge dates and relevant diagnostic and procedure codes. Diagnostic codes are recorded using the International Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) clinical terminology and can be assigned a position from 1 to 20; codes in position 1 usually indicate the primary cause of admission (or main cause of extension of hospital stay) [9]. Civil Registrations included information on date of death and underlying and contributing causes of death (also coded using ICD-10). HES and Civil Registrations were linked and supplied by NHS England [10].

Reference population
To derive a reference population of people hospitalised with COVID-19 in England (thus potentially eligible for RECOVERY), we used an anonymised database covering the entirety of England which includes linked HES and Civil Registrations data continuously collected since 1999. These data were linked and supplied by NHS England, and are analysed at the University of Oxford [11]. More information can be found in the NHS England Data Uses Register at http://digital.nhs.uk/services/data-access-request-service-dars/data-uses-register (reference: DARS-NIC-315419-F3W7K). Approval for the use of the datasets was provided by the Central and South Bristol Research Ethics Committee (ref 04/Q2006/176).
The reference population was ascertained based on the presence of a COVID-19 ICD-10 code (U071, 'COVID-19, Virus identified', or U072, 'COVID-19, Virus not identified') [12]. This approach was informed by preliminary cross-validation work (Additional file 1: Annex III) using linked HES and SARS-CoV-2 testing data for RECOVERY participants, which showed that 92% of RECOVERY participants recruited in England with a positive SARS-CoV-2 test (as captured in NHS England's COVID-19 Second Generation Surveillance System [SGSS] dataset) [13] had an admission in the HES data which included one of these codes in the primary diagnostic position. We therefore restricted our reference population to individuals with relevant ICD-10 codes in the primary position to avoid inclusion of people in whom COVID-19 was not the main reason for care. The RECOVERY cohort is largely contained within the reference population, but given the anonymised nature of the national datasets it was not possible to identify its members.
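The ascertainment rule described above can be sketched as a simple filter over episode records. This is an illustrative sketch only: the `Episode` structure and its field names are hypothetical and do not reflect the actual HES schema or the authors' Stata code; only the codes U071 and U072 and the primary-position rule come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# ICD-10 codes named in the text (HES stores them without the dot).
COVID_CODES = {"U071", "U072"}


@dataclass
class Episode:
    """Hypothetical, simplified episode record (not the real HES layout)."""
    person_id: str
    diagnoses: List[str]  # ordered ICD-10 codes; index 0 is diagnostic position 1


def has_primary_covid_code(episode: Episode) -> bool:
    """True when diagnostic position 1 holds a COVID-19 code."""
    return bool(episode.diagnoses) and episode.diagnoses[0] in COVID_CODES


def select_reference_population(episodes: Iterable[Episode]) -> Set[str]:
    """Identifiers of people with at least one episode coded U071/U072 in position 1."""
    return {e.person_id for e in episodes if has_primary_covid_code(e)}
```

Under this rule, a person whose only COVID-19 code sits in a secondary position would not be selected, which mirrors the stated aim of excluding admissions where COVID-19 was not the main reason for care.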

Analysis period
For each individual in RECOVERY and the reference population, we assigned an index date as the start of the earliest HES episode with U071/U072 in the first diagnostic position. For RECOVERY participants with index dates before 1 March 2020 (indicating long episodes before inclusion in the study; n = 22) or no COVID-19 codes in their HES records (n = 1465), we used the randomisation date as the index date. We then restricted our analysis period to index dates between 1 March 2020 and 30 November 2021 inclusive. These analyses were not extended beyond this time-point because the launch of the high-dose dexamethasone comparison in the UK (only suitable for patients with oxygen or ventilation requirements) resulted in more selected patient populations being included in the trial [14].
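The index-date logic can be expressed as a short function. The sketch below is one reading of the rules in this section, not the authors' code; the function and parameter names are hypothetical, and it assumes that reference-population members simply have no randomisation date.

```python
from datetime import date
from typing import Optional, Sequence

STUDY_START = date(2020, 3, 1)
STUDY_END = date(2021, 11, 30)


def assign_index_date(
    covid_episode_starts: Sequence[date],
    randomisation_date: Optional[date] = None,
) -> Optional[date]:
    """Earliest start of an episode with U071/U072 in position 1.

    For trial participants (randomisation_date given), fall back to the
    randomisation date when there is no such episode or when the earliest
    one starts before the study period.
    """
    earliest = min(covid_episode_starts) if covid_episode_starts else None
    if randomisation_date is not None and (earliest is None or earliest < STUDY_START):
        return randomisation_date
    return earliest


def in_analysis_period(index_date: Optional[date]) -> bool:
    """Inclusive restriction to 1 March 2020 through 30 November 2021."""
    return index_date is not None and STUDY_START <= index_date <= STUDY_END
```

For example, `assign_index_date([date(2020, 2, 10)], date(2020, 4, 1))` falls back to the randomisation date, whereas the same episode list without a randomisation date keeps the February date and is then dropped by `in_analysis_period`.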

Baseline characteristics and outcomes
We used HES data in both cohorts to extract baseline clinical characteristics and demographics including age, sex, ethnicity, deprivation (quintile of Index of Multiple Deprivation 2019) [15], geographical location, Charlson Comorbidity Score [16,17] and its components, and Hospital Frailty Risk Score [18]. Comorbidities were defined as the presence of a relevant ICD-10 code in any diagnostic position recorded within 5 years before the index date (i.e. excluding the index episode). Further methodological details, including the ICD-10 codes used, are provided in Additional file 1: Annex I. Geographical location data (including for deprivation assessments) were extracted from HES records and ascertained from the full postcode in the RECOVERY HES data and from the lower layer super output area of the postcode in the national HES data.
For outcomes, we calculated all-cause mortality within 28 days using linked HES and Civil Registrations data. Ascertainment of fact and date of death was based on these linked data sources (derivation methodology described elsewhere) [19]. We considered death records occurring in either healthcare systems data source. We ignored reports of deaths of RECOVERY participants recorded only on the CRF data as there were no CRF data for the reference population.
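As a rough illustration of the 5-year lookback rule for comorbidities, the sketch below checks whether any prior episode inside the window carries a code from a given list. The data layout and function names are hypothetical, the code prefixes in the usage note are examples only (the codes actually used are listed in Additional file 1: Annex I), and the sketch assumes that the index episode can be excluded by requiring an episode start strictly before the index date.

```python
from datetime import date
from typing import Iterable, List, Tuple


def five_years_before(d: date) -> date:
    """Same calendar day five years earlier (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year - 5)
    except ValueError:
        return d.replace(year=d.year - 5, day=28)


def has_comorbidity(
    episodes: Iterable[Tuple[date, List[str]]],
    index_date: date,
    code_prefixes: Tuple[str, ...],
) -> bool:
    """True if any diagnostic position of an episode starting in the five
    years before the index date holds a code beginning with one of the prefixes."""
    window_start = five_years_before(index_date)
    for start, codes in episodes:
        # Strict upper bound: episodes starting on the index date are excluded.
        if window_start <= start < index_date:
            if any(code.startswith(code_prefixes) for code in codes):
                return True
    return False
```

With an index date of 1 April 2020, an episode from 2018 coded `E119` would count towards a diabetes flag defined by the prefix `E11`, while a heart failure code (`I500`) recorded only in 2014 or only in the index episode would not count.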

Statistical analyses
This analysis is limited to RECOVERY participants in England with available HES data. To assess how this selection may have affected the cohort characteristics, we first compared the characteristics of those recruited in England with those recruited in other UK nations (using CRF data for all characteristics except ethnicity, and healthcare systems data in each nation for ethnicity). We then compared the characteristics of RECOVERY participants recruited in England who had available HES data with those who did not (using CRF data for all characteristics except ethnicity, for which we used healthcare systems data from primary care).
We compared baseline characteristics and 28-day mortality of the RECOVERY cohort with those of the reference population, in each case restricted to England only. Age was stratified into 4 groups: < 60, 60-69, 70-79, and ≥ 80 years. We presented continuous parameters as mean with standard deviation (SD) or median with interquartile range (IQR) as appropriate (with visual assessment of frequency distribution for normality) and frequency counts and percentage distribution for categorical parameters. We compared age, sex, and region of residence by calculating a representativeness ratio (defined as the proportion of people within RECOVERY in each category divided by the proportion of people within the reference population in the same category) and presented these along with 95% confidence intervals (95% CI) [20]. We also calculated a recruitment ratio, defined as the number of individuals included in RECOVERY divided by the number of individuals in the reference population. We then aggregated individuals in each cohort into three-month periods and conducted the same calculations as above for each time period separately.
The primary RECOVERY trial outcome of 28-day all-cause mortality was calculated starting from the index date in both cohorts, overall and over time (by three-month periods). We presented crude and age- and sex-adjusted mortality rates with 95% CI, with adjustment performed using direct standardisation methods [20] (i.e. applying RECOVERY mortality rates to the reference population age and sex composition using the age groups mentioned above). Further methodological details are provided in Additional file 1: Annex I.
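The two ratios defined above can be sketched as follows. The ratio definitions come from the text, but the confidence interval method is an assumption: the sketch uses a standard log-scale (delta method) interval for a ratio of two independent proportions, which may differ from the method in reference [20] and ignores the fact that the trial cohort is largely contained within the reference population.

```python
import math
from typing import Tuple


def representativeness_ratio(
    trial_in_category: int,
    trial_total: int,
    ref_in_category: int,
    ref_total: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Ratio of category proportions (trial / reference) with an approximate 95% CI."""
    p_trial = trial_in_category / trial_total
    p_ref = ref_in_category / ref_total
    ratio = p_trial / p_ref
    # Standard error of log(ratio), treating the two samples as independent (assumption).
    se_log = math.sqrt((1 - p_trial) / trial_in_category + (1 - p_ref) / ref_in_category)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


def recruitment_ratio(trial_total: int, ref_total: int) -> float:
    """Number included in the trial divided by the number in the reference population."""
    return trial_total / ref_total


# Cohort sizes are taken from the text; the female counts are approximations
# reconstructed from the rounded percentages (37% and 45%), not published counts.
female_ratio, lower, upper = representativeness_ratio(14249, 38510, 155822, 346271)
overall_recruitment = recruitment_ratio(38510, 346271)
```

A representativeness ratio below 1 indicates that a category is underrepresented in the trial relative to the reference population; with the approximate counts above the ratio for women is roughly 0.82, and the overall recruitment ratio is about 11%.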
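Direct standardisation, as described above, weights the stratum-specific mortality observed in one cohort by the stratum sizes of a standard population. The sketch below shows the arithmetic only; the strata and all numbers in the example are hypothetical, and it does not reproduce the confidence interval calculation or the rolling three-month implementation used in the paper.

```python
from typing import Dict, Hashable


def directly_standardised_rate(
    stratum_rates: Dict[Hashable, float],
    standard_counts: Dict[Hashable, int],
) -> float:
    """Weighted average of stratum-specific rates, weighted by the standard
    population's stratum sizes (here: the reference population's age-sex mix)."""
    total = sum(standard_counts.values())
    return sum(stratum_rates[s] * n for s, n in standard_counts.items()) / total


# Hypothetical example with two age groups and both sexes (illustrative values only).
trial_rates = {
    ("<80", "F"): 0.10, ("<80", "M"): 0.15,
    ("80+", "F"): 0.35, ("80+", "M"): 0.40,
}
trial_counts = {
    ("<80", "F"): 300, ("<80", "M"): 500,
    ("80+", "F"): 80, ("80+", "M"): 120,
}
reference_counts = {
    ("<80", "F"): 300, ("<80", "M"): 350,
    ("80+", "F"): 200, ("80+", "M"): 150,
}

# Weighting by the trial's own composition recovers its crude rate.
crude = directly_standardised_rate(trial_rates, trial_counts)
adjusted = directly_standardised_rate(trial_rates, reference_counts)
```

In this made-up example the adjusted rate exceeds the crude rate because the standard population contains proportionally more people in the older, higher-mortality strata, which is the same direction of change reported for RECOVERY after standardisation.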
We used Stata v17/MP to derive baseline characteristics and outcomes in HES and Civil Registrations data in both cohorts and R v4.2.1 for all subsequent data management, statistical analysis, and plotting (further details are provided in Additional file 1: Annex I).

Baseline characteristics
Up until 1 September 2022, RECOVERY recruited 46,010 participants, of whom 44,766 were in the UK and 39,952 in England. Of those recruited in England, 39,304 (98.4%) had available HES data, and 38,780 were recruited within the analysis period (1 March 2020-30 November 2021). After excluding participants aged below 16 years at the index date, a total of 38,510 participants were included in our analysis (Fig. 1). RECOVERY participants recruited in other UK nations had generally similar characteristics to those recruited in England (Additional file 1: Supplementary Table S1). People with no HES data available were younger, less frequently of white ethnicity, and had generally lower comorbidity burden and need for respiratory support at randomisation (Additional file 1: Supplementary Table S2). The reference population included 346,271 individuals (Fig. 1); for every 100 people admitted with COVID-19 in England, 11 participants were recruited to RECOVERY. When considering geographical region, the proportion of relevant patients recruited to RECOVERY in London, West Midlands, and Yorkshire and The Humber was lower than in the other regions of England (Fig. 2).
Table 1 shows the baseline characteristics of both cohorts. RECOVERY participants were less frequently female (RECOVERY 37% vs reference population 45%) and were on average slightly younger than the reference population (mean age [SD]: 62.6 [15.3] vs 65.7 [18.5] years), with people aged 80+ and women underrepresented in RECOVERY throughout the analysis period (Additional file 1: Supplementary Figure S1 and Supplementary Table S3). RECOVERY participants were more frequently of White background (83% vs 79%) (Table 1 and Additional file 1: Supplementary Figure S2) but had similar deprivation status overall and throughout the study period (Additional file 1: Supplementary Figure S3).
With respect to clinical conditions, RECOVERY participants had a lower prevalence of comorbidity (median Charlson Comorbidity Score [IQR]: RECOVERY 3.0 [1.0-5.0] vs reference population 4.0 [1.0-6.0]) and were less frail (median Hospital Frailty Risk Score [IQR]: 5.1 [1.8-11.4] vs 6.3 [1.8-16.3]). These differences were largely explained by the age structure of the two cohorts, with small differences remaining in the prevalence of some comorbidities, including cardiovascular disease, congestive heart failure, and dementia, after accounting for age (Additional file 1: Supplementary Figures S4-S6).

Outcomes
Overall, the crude all-cause 28-day mortality was 20.6% (95% CI: 20.2-21.0%) in RECOVERY and 24.8% (95% CI: 24.6-25.0%) in the reference population, with mortality decreasing substantially in both cohorts from March 2021 onwards. After standardising the RECOVERY cohort to the age-sex composition of the national reference population, 28-day mortality in RECOVERY was similar to the reference population (23.7%, 95% CI: 23.3-24.1%; Fig. 3). Age-stratified mortality rates were similar between the two cohorts, with the exception of March-May 2020 where mortality was lower in RECOVERY (Additional file 1: Supplementary Figures S7-S8 and Supplementary Table S4). When mortality was assessed separately by comorbidity level and age, the difference in 28-day mortality between the two cohorts in March-May 2020 appeared to be mostly driven by older and more

Discussion
This study compared the characteristics of RECOVERY trial participants with people admitted to hospital due to COVID-19 in England. Our main findings were that RECOVERY participants were generally similar, but slightly younger, less frequently female, and had an overall lower comorbidity and frailty burden, much of which was attributable to age differences. After adjustment for age and sex, 28-day mortality in the RECOVERY cohort was similar to that in the wider population of patients admitted to hospital with COVID-19 in England. This pattern was observed throughout the period studied, with the exception of March-May 2020 (corresponding to the first COVID-19 wave in the UK) when, even after adjusting for age and sex, 28-day mortality in RECOVERY was slightly lower than in the reference population. The reasons for this are not fully explained by differences in measured frailty or comorbidity as assessed in our analyses and may be attributable to factors not captured in the datasets available in this study.
Older adults are frequently underrepresented in trials [21] and have been excluded from over half of COVID-19 clinical trials and all major vaccine trials [22]. Although RECOVERY does not have an upper age limit (and some participants were aged over 100 years), in our study, RECOVERY participants were on average 3 years younger, with underrepresentation of people aged ≥ 80. RECOVERY participants were also less frequently female (37% vs 45%), but it is not possible to identify the underlying reasons for this in the available data. However, this is similar to results found in other trials and may be due to under-recruitment of older patients (who are more frequently women) [23,24]. Of note, we found important differences in recruitment across different geographical regions, with the recruitment ratio (the number of individuals included in RECOVERY divided by the number of individuals in the reference population) ranging from 7.4 to 16.4%. The reasons for this are likely to be complex, including issues related to local research infrastructure and funding, competing studies, demand on local clinical services, and clinician and patient willingness to engage with research. Data on these parameters are not available for this study, but these differences merit further investigation. We also found that comorbidity and frailty scores were lower in the RECOVERY cohort compared with the reference population. Most of these differences were attributable to age composition, but within older age groups, comorbidities and the overall frailty risk scores remained slightly higher in the reference population. Clinical decision making about eligibility for randomised trials will inevitably result in differences between the trial cohort and the target population; however, the proportional estimates of treatment effect from trials are usually generalisable, unless there are substantial differences in the biology of the target disease or the effectiveness of the intervention in the non-trial context [4,5]. While crude 28-day all-cause mortality was lower in RECOVERY, age- and sex-adjusted mortality was generally similar, with similar trends in both cohorts over time. The reduction seen from March 2021 onwards, consistent with previous reports [25], may represent the effect of SARS-CoV-2 vaccination uptake, which greatly reduced the likelihood not only of hospital admission but also of death following hospitalisation [26,27]. Overall, the absolute effect estimates generated by RECOVERY were generalisable to the national population during the period studied. However, secular trends in mortality rates should be considered, and the best estimate of the likely absolute effect size in current clinical practice requires application of the proportional treatment effect from the RECOVERY trial to current absolute event rates among patients hospitalised with COVID-19 [4,5].
Table 1 notes: HES, Hospital Episode Statistics; IQR, interquartile range; SD, standard deviation. (a) Proportions for people with known and unknown geographical region and ethnicity were calculated separately, using the number with known region or ethnicity as the denominator for each category and the entire cohort as the denominator for those with unknown region or ethnicity. (b) ICD-10 codes for AIDS/HIV are censored from HES data.
Our study has a number of limitations. We were not able to determine baseline respiratory status (which has been shown to be an important determinant of the proportional and absolute benefits of corticosteroid treatment) [28] in our reference cohort, since there was low agreement between respiratory support status extracted from HES alone and that collected in the trial (based on a larger number of linked data sources) and used in published analyses (Additional file 1: Annex IV). We also cannot be certain whether our reference population had clinically significant COVID-19, although we have mitigated this by including only people with a relevant ICD-10 code in the primary diagnostic position. Our analysis was restricted to people admitted in England. Baseline characteristics were similar when comparing RECOVERY participants recruited across all UK nations, but may differ from non-UK countries. Finally, our analysis was restricted to the period from March 2020 to November 2021, due to changes to trial eligibility which could not be replicated in the reference population with the available data. However, recruitment to RECOVERY declined significantly from December 2021 onwards (along with national COVID-19 admissions), so that extending the analysis period to the time of writing would add only a small number of additional deaths (~ 4%), which would be unlikely to meaningfully influence interpretation of our results.
Fig. 3 All-cause 28-day mortality over time in RECOVERY and the reference population. Twenty-eight-day mortality is the proportion of people with death recorded within 28 days of their index date (with 95% confidence intervals included). Adjustment performed by applying RECOVERY 28-day mortality to an age- (5-year bands) and sex-standardised population using the reference population, on a rolling basis within each time period (for 28-day mortality and age and sex breakdown)

Conclusion
The RECOVERY trial recruited a broad patient population that was generally representative of people admitted to hospital due to COVID-19 in England during the same period, with respect to both baseline characteristics and subsequent mortality. Twenty-eight-day mortality declined substantially in both the RECOVERY and reference populations throughout the period studied. Estimates of current mortality rates from healthcare systems data combined with the proportional treatment effects from trials are needed to estimate the likely absolute effects of the treatments tested within current practice.
anonymised English mortality (Civil Registrations) and hospitalisations (HES) data used to derive the reference population can be obtained upon application to NHS England at www.digital.nhs.uk. The statistical programming code used in this work is available for inspection and reuse at https://github.com/gpessoaamorim/recovery-generalizability-representativeness.
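The point about combining a proportional treatment effect with current event rates can be illustrated with simple arithmetic. The sketch below is a simplification that treats the proportional effect as a risk ratio applied to a baseline 28-day risk; the risk ratio and baseline risks in the example are hypothetical and are not RECOVERY results.

```python
def absolute_risk_reduction(baseline_risk: float, risk_ratio: float) -> float:
    """Absolute risk reduction implied by applying a proportional effect
    (expressed as a risk ratio) to a baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if risk_ratio < 0.0:
        raise ValueError("risk_ratio must be non-negative")
    return baseline_risk * (1.0 - risk_ratio)


# Hypothetical: the same proportional effect applied to two different baseline risks.
hypothetical_risk_ratio = 0.85
arr_high_baseline = absolute_risk_reduction(0.25, hypothetical_risk_ratio)
arr_low_baseline = absolute_risk_reduction(0.10, hypothetical_risk_ratio)
```

With these made-up inputs, the same proportional effect corresponds to an absolute reduction of 3.75 percentage points at a 25% baseline risk but only 1.5 percentage points at a 10% baseline risk, which is why declining mortality over time changes the expected absolute benefit.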

Fig. 1 CONSORT diagram depicting the cohort derivation process. The 'a' symbol indicates the following: randomised up until 1 September 2022. The 'b' symbol indicates the following: index date is the episode start date for the earliest episode with a COVID-19 ICD-10 code in the primary diagnostic position. The 'c' symbol indicates the following: up to June 2022 (latest data included in the raw extract)

Fig. 2 Geographical representativeness of the RECOVERY trial cohort in comparison with the national reference population. Number of RECOVERY participants plotted at the location of the recruiting NHS Trust hospital site. Recruitment ratios were calculated by dividing the number of RECOVERY participants recruited in each region by the number of individuals in the reference population in the same region and are presented by region. The average recruitment ratio across all English regions was 11.1%. There were 1097 and 2409 individuals with missing residential area in HES data in the RECOVERY and the reference population cohort, respectively

Table 1
Baseline cohort characteristics